Update environment pins; fix demo notebook; and sync to Hugging Face #6
sou-cheng-choi wants to merge 15 commits into main
alegresor
left a comment
Please do not put line breaks in the README. There should be an option in your editor to toggle word wrap in order to view long lines. Line breaks are hard to maintain when text is edited.
Pull Request Overview
This PR updates environment dependencies and adds infrastructure for uploading the LDData repository to Hugging Face Datasets Hub. The main changes support transitioning from a standalone GitHub repository to a publicly accessible dataset on Hugging Face, making low-discrepancy point set parameters more discoverable and easier to use in QMC research.
Key changes:
- Updates `qmcpy` to version 2.0 and `qmctoolscl` to version 1.1.5 to fix broken pip installations
- Adds comprehensive upload tooling (`upload.py`, `git_lfs_upload.sh`) and GitHub Actions workflow for automated synchronization to Hugging Face
- Reorganizes documentation: transforms `README.md` into a Hugging Face dataset card and moves technical specifications to `LD_DATA.md`
Reviewed Changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| `env.yml` | Updates qmcpy and qmctoolscl versions, adds huggingface_hub dependency for upload functionality |
| `upload.py` | New Python script for uploading repository to Hugging Face Datasets Hub with retry logic and fallback mechanisms |
| `scripts/git_lfs_upload.sh` | New bash script for git-based uploads using git-lfs for large files |
| `README.md` | Transformed into a Hugging Face dataset card with usage examples, citations, and dataset structure documentation |
| `LD_DATA.md` | New file containing the original technical specification for low-discrepancy data formats (moved from old README) |
| `LICENSE.txt` | Adds Apache 2.0 license file |
| `.gitignore` | Adds patterns for Python cache files, VS Code settings, and script directories |
| `.github/workflows/sync-to-huggingface.yml` | New GitHub Actions workflow for automated synchronization to Hugging Face on push |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@alegresor @copilot @zitterbewegung CI tests pass now.

@sou-cheng-choi I've opened a new pull request, #8, to work on those changes. Once the pull request is ready, I'll request review from you.

We should not create a wrapper on top of `huggingface_hub` if we are only going to update or create a dataset. See https://huggingface.co/docs/datasets/en/upload_dataset

The dataset is located at https://huggingface.co/datasets/QMCSoftware/LDData/tree/main
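To illustrate the point about not wrapping `huggingface_hub`, here is a minimal sketch of what a direct upload might look like. It relies on `HfApi.upload_folder` from `huggingface_hub`; the repository id comes from the dataset URL above. The function name `sync_dataset`, the `HF_TOKEN` environment variable, the ignore patterns, and the commit message are assumptions made for illustration, not the contents of `upload.py` in this PR.

```python
import os


def sync_dataset(folder_path, repo_id="QMCSoftware/LDData", api=None):
    """Upload a local folder to a Hugging Face dataset repo in one call.

    `api` is injectable only so the function can be exercised without
    network access; by default a real HfApi client is created.
    """
    if api is None:
        # Imported lazily so the module can be loaded without the dependency.
        from huggingface_hub import HfApi

        # Assumption: a write token is exposed as the HF_TOKEN env variable.
        api = HfApi(token=os.environ.get("HF_TOKEN"))

    return api.upload_folder(
        folder_path=folder_path,
        repo_id=repo_id,
        repo_type="dataset",
        # Hypothetical patterns; adjust to what should stay out of the dataset.
        ignore_patterns=[".git/*", "__pycache__/*"],
        commit_message="Sync from GitHub",
    )
```

Calling `sync_dataset(".")` from the repository root would push the working tree in a single commit; whether that covers everything the PR's tooling does (retries, git-lfs fallback) is not something this sketch attempts to answer.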
alegresor
left a comment
Thank you for working on this @sou-cheng-choi, it looks like you have done a lot!
May I suggest you break this into much smaller PRs that would be easier to review and faster to get merged? I would suggest the following:
1. A PR which adds the LICENSE
2. A PR which removes `env.yml` in favor of `pyproject.toml`
3. A PR which adds your enhancements to the `README.md` and adds the `LD_Data.md`
4. A PR which adds the Hugging Face action

For adding the HF action, I must admit I do not understand what many of your files are doing. The repo ruslanmv/How-to-Sync-Hugging-Face-Spaces-with-a-GitHub-Repository gives an MWE of how to sync data from a repo into HF. Based on their MWE, I would expect the suggested PR 4 to add only a single file to `.github/workflows/`, which automatically uploads the dataset to HF whenever something is pushed to a branch.

I will close this PR and break it into multiple PRs following your suggestions.

Reopening so that I won't forget to open the sub-PRs.
`pip install` in `env.yml` by updating the `qmctoolscl` pip pin to version 1.1.5 and `qmcpy` to 2.0.
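
One way to confirm that an environment actually picked up the updated pins is to compare installed versions against them. The sketch below uses the standard-library `importlib.metadata`; the helper name `find_pin_mismatches` is hypothetical, and treating the pins as exact version strings is an assumption, since the exact specifiers in `env.yml` are not shown here.

```python
from importlib import metadata

# Pins quoted in this PR; exact-match comparison is an assumption.
PINS = {"qmcpy": "2.0", "qmctoolscl": "1.1.5"}


def find_pin_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for packages that do not match.

    `found` is None when the package is not installed. `get_version` is
    injectable so the check can be exercised without installing anything.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Running `find_pin_mismatches(PINS)` inside the environment returns an empty dictionary when both packages match the pins, and otherwise lists each package with its expected and installed version.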